Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization


Abstract

Adaptivity is an important yet under-studied property in modern optimization theory. The gap between the state-of-the-art theory and current practice is striking in that algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step sizes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners in selecting algorithms that work broadly without tweaking the hyperparameters. In this work, blending the “geometrization” technique introduced by [L. Lei and M. I. Jordan, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 148--156] and the SARAH algorithm of [L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 2613--2621], we propose the geometrized SARAH algorithm for nonconvex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak--Łojasiewicz (PL) constant, if present. In addition, it achieves the best-available convergence rate for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
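The SARAH estimator that the geometrized variant builds on replaces the full gradient with a recursive mini-batch correction, v_t = ∇f_{B_t}(x_t) − ∇f_{B_t}(x_{t−1}) + v_{t−1}, re-anchored by a full gradient at the start of each epoch. A minimal NumPy sketch on a toy least-squares problem follows; the problem, step size, batch size, and epoch lengths are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def full_grad(x):
    """Full gradient of f(x) = ||Ax - b||^2 / (2n)."""
    return A.T @ (A @ x - b) / n

def batch_grad(x, idx):
    """Mini-batch gradient over the sampled component functions."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

eta = 0.05
x = np.zeros(d)
for epoch in range(3):
    v = full_grad(x)                   # outer step: anchor with a full gradient
    x_prev, x = x, x - eta * v
    for t in range(30):                # inner loop: recursive SARAH correction
        idx = rng.integers(0, n, size=16)
        v = batch_grad(x, idx) - batch_grad(x_prev, idx) + v
        x_prev, x = x, x - eta * v
```

Unlike SVRG, the estimator is updated recursively from the previous iterate rather than from a fixed snapshot, which is what the geometrization technique exploits.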


Similar Articles

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Asynchronous parallel implementations of stochastic gradient (SG) methods have been broadly used in training deep neural networks and have achieved many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provi...


Stochastic Recursive Gradient Algorithm for Nonconvex Optimization

In this paper, we study and analyze the mini-batch version of StochAstic Recursive grAdient algoritHm (SARAH), a method employing the stochastic recursive gradient, for solving empirical loss minimization for the case of nonconvex losses. We provide a sublinear convergence rate (to stationary points) for general nonconvex functions and a linear convergence rate for gradient dominated functions,...


Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tack...
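The proximal stochastic gradient step this abstract refers to takes a mini-batch gradient of the smooth part and then applies the proximal operator of the nonsmooth convex part. A hedged sketch for an ℓ1-regularized least-squares instance, where the prox is soft-thresholding (all parameters and the toy data are illustrative assumptions):

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

rng = np.random.default_rng(3)
n, d = 120, 8
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:2] = [1.5, -2.0]                           # sparse ground truth
b = A @ x_true + 0.05 * rng.standard_normal(n)

lam, eta, batch = 0.05, 0.05, 16
x = np.zeros(d)
for t in range(400):
    idx = rng.integers(0, n, size=batch)
    g = A[idx].T @ (A[idx] @ x - b[idx]) / batch   # mini-batch gradient of the smooth part
    x = soft_threshold(x - eta * g, eta * lam)     # proximal step for lam * ||x||_1
```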


Block stochastic gradient iteration for convex and nonconvex optimization

The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions or solve a stochastic optimization problem, very quickly to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are (much...
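The block variant described here updates one block of variables at a time using a stochastic gradient restricted to that block. A minimal sketch with two coordinate blocks cycled in order, on an illustrative least-squares problem (block partition, step size, and batch size are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 6
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.05 * rng.standard_normal(n)

blocks = [np.arange(0, 3), np.arange(3, 6)]   # two coordinate blocks
x = np.zeros(d)
eta = 0.05
for t in range(400):
    blk = blocks[t % len(blocks)]             # cycle through the blocks
    idx = rng.integers(0, n, size=10)         # fresh mini-batch per update
    g = A[idx][:, blk].T @ (A[idx] @ x - b[idx]) / len(idx)
    x[blk] -= eta * g                         # update only the chosen block
```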


Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization

This paper considers a class of constrained stochastic composite optimization problems whose objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a certain non-differentiable (but convex) component. In order to solve these problems, we propose a randomized stochastic projected gradient (RSPG) algorithm, in which proper mini-batch of samp...
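A projected stochastic gradient step of the kind RSPG is built around computes a mini-batch gradient and then projects back onto the constraint set. A hedged sketch with a Euclidean-ball constraint (the constraint set, step size, and toy data are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 150, 4
A = rng.standard_normal((n, d))
x_true = np.array([0.5, -0.4, 0.3, 0.2])       # feasible ground truth
b = A @ x_true + 0.1 * rng.standard_normal(n)

def project_ball(z, radius=1.0):
    """Euclidean projection onto {z : ||z||_2 <= radius}."""
    nrm = np.linalg.norm(z)
    return z if nrm <= radius else z * (radius / nrm)

f = lambda z: np.sum((A @ z - b) ** 2) / (2 * n)

eta, batch = 0.1, 16
x = np.zeros(d)
for t in range(200):
    idx = rng.integers(0, n, size=batch)       # mini-batch of component functions
    g = A[idx].T @ (A[idx] @ x - b[idx]) / batch
    x = project_ball(x - eta * g)              # projected stochastic gradient step
```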



Journal

Journal title: SIAM Journal on Mathematics of Data Science

Year: 2022

ISSN: 2577-0187

DOI: https://doi.org/10.1137/21m1394308